Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new API to write multiscales metadata #149

Merged
merged 3 commits into from
Dec 17, 2021

Conversation

sbesson
Copy link
Member

@sbesson sbesson commented Dec 16, 2021

While investigating the testing requirements for #148, I found that my primary need are API helper methods allowing to write NGFF metadata e.g. write_plate_metadata, write_well_metadata and generate test data

As a first step towards these APIs, this PR refactors the existing multiscales generation code and extract the multiscales metadata addition logic into a separate API write_multiscales_metadata.
A direct application of this method is the scenario where a library would have its own implementation of the data writing but would still like to write the metadata using the core API. An example is omero-cli-zarr where https://github.com/ome/omero-cli-zarr/blob/7589cc47920b912480b3e6daa282c3d773c2bfdf/src/omero_zarr/raw_pixels.py#L276-L289 could be updated to consume this API.

In addition to the new API, the existing internal axes validation method is split into two, separating the validation of the axes against the array dimensions and the inference of axes values for 2D/5D cases from the strict validation of the axes using the allowed values.

A set of unit tests are also added to cover the different scenarios.

- Split the axes validation logic from the axes guessing logic so it can
be re-used
- Add tests covering the scenarios of format 0.1/0.2 as well as invalid axes
@codecov
Copy link

codecov bot commented Dec 16, 2021

Codecov Report

Merging #149 (a2195e5) into master (5c43ff5) will increase coverage by 0.24%.
The diff coverage is 95.83%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #149      +/-   ##
==========================================
+ Coverage   71.04%   71.28%   +0.24%     
==========================================
  Files          11       11              
  Lines        1126     1132       +6     
==========================================
+ Hits          800      807       +7     
+ Misses        326      325       -1     
Impacted Files Coverage Δ
ome_zarr/writer.py 95.00% <95.83%> (+1.75%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5c43ff5...a2195e5. Read the comment docs.

@will-moore
Copy link
Member

Looks good. I don't know if I'd guess the different behaviour of _validate_axes() and _validate_axes_names() from their names, but naming things is hard.
All of this will change soon for v0.4 anyway. I imagined that we might use _validate_axes(axes: List[Dict]) to validate axes objects (in v0.4) whereas validation of their names with _validate_axes_names(axes: List[str]), but I don't know if it will make sense to separate the logic in the same way for v0.4...?

@sbesson
Copy link
Member Author

sbesson commented Dec 16, 2021

@will-moore thanks for picking that up. Yes the name is not the greatest and I was also pondering how this would evolve with the upcoming axes specification.

As we decided to keep this API private, we can certainly iterate on names etc. At least for 0.3, I identified three different behaviors:

  • validate the value of the axes i.e. is a subset of TCZYX?
  • validate the consistency of the axes in relationship to the data i.e. are the dimensions consistent?
  • infer unspecified axes in some simple scenarios e.g. for 2D or 5D data

I think the spirit of 0c71fbe is to separate these different functionalities.

How do you expect the changes in 0.4 will affect this API? One thought is that the axis guessing based on the dimensionality might become irrelevant. But are the other behaviors that different? Or is it only the input type which will primarily vary?

@will-moore
Copy link
Member

Yes, as this is a private API, it's OK to go with what we have here and update for 0.4.
I think for v0.4 we could still guess axes for 2D, maybe not for 5D.
The input type will certainly vary, at least for _validate_axes() since axes will be Dicts.

@sbesson
Copy link
Member Author

sbesson commented Dec 17, 2021

Thanks both. Unless you can feel a particular need for a patch release, I'd propose to get this into the main development branch and bump the version to 0.3.0.dev0.

I have started working on similar APIs for plate/well metadata. My goal is to help to construct the unit tests allowing to test #148 and build the framework for the support of ome/ngff#24.

Said otherwise, I am starting to see a 0.3.0 roadmap emerging with a Jan 2022 timeline including:

  1. new metadata writing API (multiscales, plate, well)
  2. the fix for HCS with <5D data
  3. anything else?

@sbesson sbesson merged commit 7916a95 into ome:master Dec 17, 2021
@sbesson sbesson deleted the write_multiscales_metadata branch December 17, 2021 21:31
@jburel
Copy link
Member

jburel commented Dec 18, 2021

Since the ome_zarr_test_suite is triggered on merge, we should wait for the test suite to finish before tagging in case there is an issue.
It is green see https://github.com/ome/ome_zarr_test_suite/actions/runs/1594165903 but we never know

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants